MLB Population Averages (2020 - 2022)

Note on recategorizations:
“Sliders” includes Slurves and Sweepers
“Curveballs” includes Knuckle Curves
“Changeup” includes Splitters
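The recategorizations above can be expressed as a small recoding step. This is a minimal R sketch. It assumes the raw data use Baseball Savant's pitch_type codes: SL slider, SV slurve, ST sweeper, CU curveball, KC knuckle curve, CH changeup, FS splitter, FF four-seamer, SI sinker, and FC cutter. The helper recode_pitch() and its collapse_fastballs flag are hypothetical and not part of the original analysis. The flag mirrors the "Fastball" grouping described under the variable importance plots.

```r
# Hypothetical helper: map Statcast-style pitch_type codes (assumed) onto the
# categories used in this document. Unknown codes come back as NA.
recode_pitch <- function(pitch_type, collapse_fastballs = FALSE) {
  out <- rep(NA_character_, length(pitch_type))
  out[pitch_type %in% c("SL", "SV", "ST")] <- "Slider"     # slider, slurve, sweeper
  out[pitch_type %in% c("CU", "KC")]       <- "Curveball"  # curveball, knuckle curve
  out[pitch_type %in% c("CH", "FS")]       <- "Changeup"   # changeup, splitter
  out[pitch_type %in% "FF"]                <- "4-Seamer"
  out[pitch_type %in% "SI"]                <- "Sinker"
  out[pitch_type %in% "FC"]                <- "Cutter"
  if (collapse_fastballs) {
    # Optional coarser grouping: all three fastball types become "Fastball"
    out[out %in% c("4-Seamer", "Sinker", "Cutter")] <- "Fastball"
  }
  out
}

recode_pitch(c("ST", "KC", "FS", "SI"))
# returns c("Slider", "Curveball", "Changeup", "Sinker")
recode_pitch(c("FF", "FC", "CH"), collapse_fastballs = TRUE)
# returns c("Fastball", "Fastball", "Changeup")
```

The function uses %in% rather than == throughout so that an NA pitch_type simply falls through to NA instead of breaking the subscripted assignment.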


Pitch      Horizontal Break   Vertical Break   Pitch Proportion   Spin Rate (rpm)
4-Seamer               7.45            14.86               0.46           2285.29
Changeup              14.03            32.27               0.26           1754.87
Curveball              9.45            53.35               0.26           2572.18
Cutter                 2.88            25.97               0.34           2380.57
Sinker                16.00            22.89               0.39           2127.16
Slider                 6.42            36.28               0.34           2432.44
Splitter              11.71            33.09               0.28           1459.77
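Each row above is a per-pitch-type mean. Here is a minimal base-R sketch of that computation. It uses a small made-up data frame whose column names (pitcher_break_x, pitcher_break_z, spin_rate) follow the modeling code later in this document. The source does not say whether the table averages over pitches or over pitchers, so this shows only the mechanics.

```r
# Made-up pitch-level rows; column names mirror the modeling code in this document
pitches <- data.frame(
  pitch           = c("Slider", "Slider", "Curveball", "Curveball"),
  pitcher_break_x = c(6.0, 7.0, 9.0, 10.0),
  pitcher_break_z = c(35.0, 37.0, 52.0, 54.0),
  spin_rate       = c(2400, 2460, 2550, 2600)
)

# Mean of each characteristic within each pitch type
averages <- aggregate(cbind(pitcher_break_x, pitcher_break_z, spin_rate) ~ pitch,
                      data = pitches, FUN = mean)
averages
```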

Pitch-by-Pitch Breakdowns:


Variable Importance Plots

Sinkers, cutters, and four-seam fastballs are combined into a single “Fastball” category.
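One common way to measure variable importance is permutation importance. You shuffle one predictor in held-out data and record how much the prediction error rises. The base-R sketch below illustrates that idea. It uses lm() as a stand-in model on simulated data so that it runs without extra packages. The column names and data are illustrative, and this is not necessarily how the original importance plots were computed.

```r
set.seed(1)
n <- 500
# Simulated predictors; woba depends on spin_rate and break_x but not on noise
sim <- data.frame(
  spin_rate = rnorm(n, 2400, 150),
  break_x   = rnorm(n, 6, 3),
  noise     = rnorm(n)
)
sim$woba <- 0.40 - 0.0001 * (sim$spin_rate - 2400) + 0.01 * sim$break_x +
  rnorm(n, sd = 0.02)

train_idx <- sample(n, 0.7 * n)
fit <- lm(woba ~ ., data = sim[train_idx, ])  # stand-in for a random forest
holdout <- sim[-train_idx, ]

mse <- function(model, data) mean((data$woba - predict(model, data))^2)

# Increase in held-out MSE after shuffling each predictor in turn
permutation_importance <- function(model, data, predictors) {
  base_mse <- mse(model, data)
  sapply(predictors, function(p) {
    shuffled <- data
    shuffled[[p]] <- sample(shuffled[[p]])  # break the link between p and woba
    mse(model, shuffled) - base_mse
  })
}

imp <- permutation_importance(fit, holdout, c("spin_rate", "break_x", "noise"))
sort(imp, decreasing = TRUE)
```

Because the error is measured on held-out rows, a predictor that the model never really relies on (noise here) scores near zero.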

wOBA: Sliders, Fastballs, Curveballs, Change-Ups

xwOBA: Sliders, Fastballs, Curveballs, Change-Ups

Run Value / 100: Sliders, Fastballs, Curveballs, Change-Ups
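Run value per 100 pitches rescales a total as 100 × (summed run value) / (pitches thrown). That equals 100 times the mean run value per pitch. Below is a small R sketch with made-up values. The sign convention (pitcher or batter perspective) depends on the data source and is not assumed here.

```r
# Run value per 100 pitches by pitch type: 100 * sum / count = 100 * mean
run_value_per_100 <- function(run_value, pitch_type) {
  means <- tapply(run_value, pitch_type, mean)
  setNames(as.vector(100 * means), names(means))  # plain named numeric vector
}

# Made-up per-pitch run values, for illustration only
rv <- c(-0.05, 0.10, -0.02, 0.30, -0.10)
pt <- c("SL", "SL", "SL", "FF", "FF")
rv100 <- run_value_per_100(rv, pt)
rv100[["SL"]]  # about 1  (= 100 * 0.03 / 3)
rv100[["FF"]]  # about 10 (= 100 * 0.20 / 2)
```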

Decision Trees

wOBA: Sliders, Four-Seams, Cutters, Sinkers, Curveballs, Change-Ups

xwOBA: Sliders, Four-Seams, Cutters, Sinkers, Curveballs, Change-Ups

Run Value / 100: Sliders, Four-Seams, Cutters, Sinkers, Curveballs, Change-Ups
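A regression tree grows by repeatedly choosing the split that most reduces squared error within the resulting child nodes. The base-R sketch below performs one such greedy step on a single predictor. The function best_split() and the toy spin-rate/wOBA values are hypothetical. The trees above were not necessarily grown with code like this. The default min_node_size = 5 simply echoes the min.node.size used in the tuning grid later on.

```r
# One greedy regression-tree step on a single numeric predictor: return the
# cut point whose two child nodes have the smallest total squared error.
best_split <- function(x, y, min_node_size = 5) {
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  n <- length(y)
  best <- list(cut = NA_real_, sse = sum((y - mean(y))^2))  # no split yet
  if (n < 2 * min_node_size) return(best)                   # too small to split
  for (i in min_node_size:(n - min_node_size)) {
    if (x[i] == x[i + 1]) next  # cannot cut between tied values
    left  <- y[1:i]
    right <- y[(i + 1):n]
    sse <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
    if (sse < best$sse) {
      best <- list(cut = (x[i] + x[i + 1]) / 2, sse = sse)
    }
  }
  best
}

# Made-up data: in this toy example lower-spin pitches allow a higher wOBA
spin <- seq(2000, 2950, by = 50)
woba <- ifelse(spin < 2500, 0.33, 0.28)
best_split(spin, woba)$cut  # 2475, halfway between 2450 and 2500
```

A full tree applies the same search to every predictor, takes the best split overall, and recurses into each child until a stopping rule is reached.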

Pitch Characteristics

wOBA: Sliders, Fastballs, Curveballs, Change-Ups

xwOBA: Sliders, Fastballs, Curveballs, Change-Ups
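A simple way to see how wOBA or xwOBA varies with one pitch characteristic is to average the outcome within quantile bins of that characteristic. Here is a minimal base-R sketch. binned_means() is a hypothetical helper, and these plots were not necessarily produced this way.

```r
# Mean outcome within quantile bins of one pitch characteristic
binned_means <- function(x, y, bins = 4) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = bins + 1), na.rm = TRUE)
  bin <- cut(x, breaks = unique(breaks), include.lowest = TRUE)  # unique() guards tied quantiles
  tapply(y, bin, mean, na.rm = TRUE)
}

# Made-up values: the outcome rises with the characteristic
binned_means(x = 1:8, y = c(1, 1, 2, 2, 3, 3, 4, 4))  # bin means 1, 2, 3, 4
```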

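library(tidyverse)  # dplyr verbs, %>%, tidyr::drop_na(), ggplot2
library(caret)      # train(); method = "ranger" also requires the ranger package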
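# Slider-only data. Rows with any missing outcome or predictor are dropped,
# because caret's formula interface stops on NAs by default (na.action = na.fail)
test_data <- Data2 %>% 
  filter(pitch_type == "SL") %>% 
  select(pitcher_id, woba, avg_speed, spin_rate, pitcher_break_x, pitcher_break_z, speed_diff) %>% 
  drop_na()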

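# Hold out 30% of pitchers (not individual rows) so no pitcher appears in both
# sets; the seed value is arbitrary and only makes the split reproducible
set.seed(2022)
train_pitcher_id <- test_data %>%
  distinct(pitcher_id) %>%
  slice_sample(prop = 0.7)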

test_train <- test_data %>% 
  filter(pitcher_id %in% train_pitcher_id$pitcher_id)

test_test <- test_data %>% 
  filter(!pitcher_id %in% train_pitcher_id$pitcher_id)
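
# Tune mtry over 1-4 with ranger's variance split rule and a fixed minimum node size
test_tune_grid <- expand.grid(mtry = 1:4,
                              splitrule = "variance",
                              min.node.size = 5)

# pitcher_id is an identifier, not a predictor, so it is dropped before fitting
test_caret <- train(woba ~ ., data = select(test_train, -pitcher_id),
                    method = "ranger",
                    num.trees = 1000,
                    trControl = trainControl(method = "cv", number = 5),
                    tuneGrid = test_tune_grid)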

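# Predicted vs. observed wOBA for the held-out pitchers; points on the dashed
# line are predicted exactly
test_test %>% 
  mutate(Prediction = predict(test_caret, newdata = test_test)) %>% 
  ggplot(aes(x = woba, y = Prediction)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed")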
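The outer split holds out whole pitchers. However, trainControl(method = "cv", number = 5) builds its tuning folds from rows, so a pitcher who contributes more than one row can land on both sides of a fold. Below is a minimal base-R sketch of pitcher-grouped folds. group_folds() is a hypothetical helper. It returns the training-row indices for each fold, which is the form that trainControl()'s index argument accepts. caret's own groupKFold() also produces a list of this kind.

```r
# Assign whole groups (e.g. pitchers) to k folds, then return, for each fold,
# the row indices used for training (every row whose group is not in that fold).
group_folds <- function(groups, k = 5) {
  ids <- unique(groups)
  if (length(ids) < k) stop("need at least k distinct groups")
  fold_of_id <- sample(rep_len(seq_len(k), length(ids)))  # balanced random assignment
  fold <- fold_of_id[match(groups, ids)]                  # fold number for each row
  lapply(seq_len(k), function(f) which(fold != f))
}

# Example: 10 hypothetical pitchers with 3 rows each
set.seed(7)
pitcher <- rep(1:10, each = 3)
folds <- group_folds(pitcher, k = 5)
lengths(folds)  # 24 training rows per fold (2 pitchers held out each time)
```

In the model above, this would plug in as trainControl(method = "cv", index = group_folds(test_train$pitcher_id, k = 5)). The indices refer to rows of test_train, and those rows keep the same order after pitcher_id is dropped.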